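Prior work has shown that the order in which a sequential learner processes the different components of the face can play an important role in the performance of facial expression recognition systems. We propose FaceTopoNet, an end-to-end deep model for facial expression recognition that is capable of learning an effective tree topology of the face. Our model traverses the learned tree to generate a sequence, which is then used to form an embedding that feeds a sequential learner. The model uses one stream for learning structure and one stream for learning texture. The structure stream focuses on the positions of the facial landmarks, while the texture stream focuses on learning texture information from patches around the landmarks. We then fuse the outputs of the two streams using an effective attention-based fusion strategy. We perform extensive experiments on four large-scale in-the-wild facial expression datasets, namely AffectNet, FER2013, ExpW, and RAF-DB, and on one lab-controlled dataset (CK+) to evaluate our approach. FaceTopoNet achieves state-of-the-art performance on three of the five datasets and obtains competitive results on the other two. We also perform rigorous ablation and sensitivity experiments to evaluate the impact of different components and parameters of our model. Lastly, we perform robustness experiments and demonstrate that FaceTopoNet is more robust to occlusion than other leading methods in the area.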
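We propose an end-to-end architecture for facial expression recognition. Our model learns an optimal tree topology for facial landmarks, whose traversal generates a sequence from which we obtain an embedding to feed a sequential learner. The proposed architecture comprises two main streams: one focuses on landmark positions to learn the structure of the face, while the other focuses on patches around the landmarks to learn texture information. Each stream is followed by an attention mechanism, and the outputs are fed to a two-stream fusion component to perform the final classification. We conduct extensive experiments on two large-scale publicly available facial expression datasets, AffectNet and FER2013, to evaluate the efficacy of our approach. Our method outperforms other solutions in the area and sets new state-of-the-art expression recognition rates on these datasets.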
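As a rough illustration of the traversal step, the sketch below turns a tree over facial landmarks into an ordered sequence. The tree, the landmark coordinates, and the choice of a depth-first preorder are all assumptions made for this example; the abstract does not state which traversal the model uses, and in the paper the topology is learned rather than hand-written.

```python
from typing import Dict, List, Tuple

def preorder_sequence(tree: Dict[int, List[int]], root: int) -> List[int]:
    """Return node indices in depth-first preorder, starting at `root`."""
    sequence: List[int] = []
    stack = [root]
    while stack:
        node = stack.pop()
        sequence.append(node)
        # Push children in reverse so the first-listed child is visited first.
        stack.extend(reversed(tree.get(node, [])))
    return sequence

# Hypothetical tree over five landmarks (node 0 is an arbitrary root).
toy_tree: Dict[int, List[int]] = {0: [1, 2], 1: [3], 2: [4]}

# Hypothetical normalized (x, y) landmark positions.
landmark_xy: Dict[int, Tuple[float, float]] = {
    0: (0.5, 0.5), 1: (0.3, 0.4), 2: (0.7, 0.4), 3: (0.3, 0.7), 4: (0.7, 0.7),
}

order = preorder_sequence(toy_tree, root=0)
# Ordered coordinates, the kind of input a sequential learner could consume.
structure_sequence = [landmark_xy[i] for i in order]
```

A texture stream could reuse the same `order` to arrange patch features cropped around each landmark, so that both streams see the face in one consistent sequence.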
The main objective of Prognostics and Health Management is to estimate the Remaining Useful Lifetime (RUL), namely, the time that a system or a piece of equipment is still in working order before starting to function incorrectly. In recent years, numerous machine learning algorithms have been proposed for RUL estimation, mainly focusing on providing more accurate RUL predictions. However, there are many sources of uncertainty in the problem, such as inherent randomness of systems failure, lack of knowledge regarding their future states, and inaccuracy of the underlying predictive models, making it infeasible to predict the RULs precisely. Hence, it is of utmost importance to quantify the uncertainty alongside the RUL predictions. In this work, we investigate the conformal prediction (CP) framework that represents uncertainty by predicting sets of possible values for the target variable (intervals in the case of RUL) instead of making point predictions. Under very mild technical assumptions, CP formally guarantees that the actual value (true RUL) is covered by the predicted set with a degree of certainty that can be prespecified. We study three CP algorithms to conformalize any single-point RUL predictor and turn it into a valid interval predictor. Finally, we conformalize two single-point RUL predictors, deep convolutional neural networks and gradient boosting, and illustrate their performance on the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) data sets.
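To make the idea of conformalizing a point predictor concrete, here is a minimal sketch of split (inductive) conformal prediction with absolute-residual scores. It is one common CP variant and is not claimed to be one of the three algorithms studied in the paper. The function name, the calibration numbers, and the clipping of the lower bound at zero are assumptions made for illustration.

```python
import numpy as np

def split_conformal_interval(cal_pred, cal_true, test_pred, alpha=0.1):
    """Wrap point predictions in intervals with target coverage 1 - alpha.

    cal_pred / cal_true: predictions and true RULs on a held-out
    calibration set that was not used to fit the point predictor.
    """
    cal_pred = np.asarray(cal_pred, dtype=float)
    cal_true = np.asarray(cal_true, dtype=float)
    test_pred = np.asarray(test_pred, dtype=float)

    scores = np.abs(cal_true - cal_pred)       # nonconformity scores
    n = scores.size
    k = int(np.ceil((n + 1) * (1 - alpha)))    # finite-sample corrected rank
    # If the calibration set is too small for the requested level,
    # the only valid interval is unbounded.
    q = np.inf if k > n else np.sort(scores)[k - 1]

    lower = np.maximum(test_pred - q, 0.0)     # assumption: RUL is non-negative
    upper = test_pred + q
    return lower, upper

# Hypothetical calibration data (for example, RUL in engine cycles).
cal_pred = [100, 80, 60, 40, 20, 50, 70, 90, 30]
cal_true = [105, 78, 63, 41, 15, 50, 74, 84, 31]
lower, upper = split_conformal_interval(cal_pred, cal_true, [3, 55], alpha=0.2)
```

The coverage guarantee relies on the calibration and test samples being exchangeable, which is the mild technical assumption referred to above. Every interval here has the same half-width `q`; adaptive variants rescale the scores to obtain input-dependent widths.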
With the advent of deep learning applications on edge devices, researchers actively try to optimize their deployment on low-power, memory-restricted devices. There are established compression methods, such as quantization, pruning, and architecture search, that leverage commodity hardware. Apart from conventional compression algorithms, one may redesign the operations of deep learning models to obtain more efficient implementations. To this end, we propose EuclidNet, a compression method designed to be implemented on hardware, which replaces multiplication, $xw$, with the squared Euclidean distance $(x-w)^2$. We show that EuclidNet is aligned with matrix multiplication and that it can be used as a measure of similarity in the case of convolutional layers. Furthermore, we show that under various transformations and noise scenarios, EuclidNet exhibits the same performance as deep learning models designed with multiplication operations.
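One way to see why a squared distance can stand in for a product as a similarity measure is the identity $-\lVert x-w\rVert^2 = 2\,x\cdot w - \lVert x\rVert^2 - \lVert w\rVert^2$. The snippet below checks it numerically on random vectors. It is only a sanity check of that algebraic relation, not a reproduction of EuclidNet's layers; the variable names and the vector size are arbitrary choices for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=8)   # stand-in for an input patch
w = rng.normal(size=8)   # stand-in for a filter

dot_similarity = x @ w                      # multiplication-based similarity
euclid_similarity = -np.sum((x - w) ** 2)   # negative squared Euclidean distance

# Expanding the square gives 2*x.w minus the two squared norms.
reconstructed = 2 * dot_similarity - x @ x - w @ w
identity_holds = bool(np.isclose(euclid_similarity, reconstructed))
```

When the norms of `x` and `w` are held fixed, ranking filters by negative squared distance gives the same order as ranking them by dot product, which is the sense in which the two measures agree.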
Performance metrics-driven context caching has a profound impact on throughput and response time in distributed context management systems for real-time context queries. This paper proposes a reinforcement learning based approach to adaptively cache context with the objective of minimizing the cost incurred by context management systems in responding to context queries. Our novel algorithms enable context queries and sub-queries to reuse and repurpose cached context in an efficient manner. This approach differs from traditional data caching approaches in three main ways. First, we make selective context cache admissions using no prior knowledge of the context or the context query load. Second, we develop and incorporate innovative heuristic models to calculate the expected performance of caching an item when making the decisions. Third, our strategy defines a time-aware continuous cache action space. We present two reinforcement learning agents: a value function estimating actor-critic agent and a policy search agent using the deep deterministic policy gradient method. The paper also proposes adaptive policies such as eviction and cache memory scaling to complement our objective. Our method is evaluated using a synthetically generated load of context sub-queries and a synthetic data set inspired by real-world data and query samples. We further investigate optimal adaptive caching configurations under different settings. This paper presents, compares, and discusses our findings that the proposed selective caching methods reach short- and long-term cost- and performance-efficiency. The paper demonstrates that the proposed methods outperform other modes of context management, such as redirector mode, database mode, and a cache-all policy, by up to 60% in cost efficiency.
We propose a framework in which multiple entities collaborate to build a machine learning model while preserving privacy of their data. The approach utilizes feature embeddings from shared/per-entity feature extractors transforming data into a feature space for cooperation between entities. We propose two specific methods and compare them with a baseline method. In Shared Feature Extractor (SFE) Learning, the entities use a shared feature extractor to compute feature embeddings of samples. In Locally Trained Feature Extractor (LTFE) Learning, each entity uses a separate feature extractor and models are trained using concatenated features from all entities. As a baseline, in Cooperatively Trained Feature Extractor (CTFE) Learning, the entities train models by sharing raw data. Secure multi-party algorithms are utilized to train models without revealing data or features in plain text. We investigate the trade-offs among SFE, LTFE, and CTFE in regard to performance, privacy leakage (using an off-the-shelf membership inference attack), and computational cost. LTFE provides the most privacy, followed by SFE, and then CTFE. Computational cost is lowest for SFE and the relative speed of CTFE and LTFE depends on network architecture. CTFE and LTFE provide the best accuracy. We use MNIST, a synthetic dataset, and a credit card fraud detection dataset for evaluations.
This work presents an actuation framework for a bioinspired flapping drone called Aerobat. This drone, capable of producing dynamically versatile wing conformations, possesses 14 body joints and is tail-less. Therefore, in our robot, unlike mainstream flapping wing designs that are open-loop stable and have no pronounced morphing characteristics, the actuation and closed-loop feedback design can pose significant challenges. We propose a framework based on integrating mechanical intelligence and control. In this design framework, small adjustments led by several tiny low-power actuators called primers can play significant flight control roles owing to the robot's computational structures. Since they are incredibly lightweight, the system can host the primers in large numbers. In this work, we aim to show the feasibility of joint motion regulation in Aerobat's untethered flights.
Flying animals, such as bats, fly through their fluidic environment as they create air jets and form wake structures downstream of their flight path. Bats, in particular, dynamically morph their highly flexible and dexterous armwing to manipulate their fluidic environment, which is key to their agility and flight efficiency. This paper presents the theoretical and numerical analysis of a wake-structure-based gait design inspired by bat flight for flapping robots, using the notion of reduced-order models and an unsteady aerodynamic model incorporating the Wagner function. The objective of this paper is to introduce the notion of gait design for flapping robots by systematically searching the design space in the context of optimization. The solution found using our gait design framework was used to design and test a flapping robot.
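For readers unfamiliar with the Wagner function, the sketch below evaluates the widely used two-term exponential approximation due to R. T. Jones, $\phi(s) \approx 1 - 0.165\,e^{-0.0455 s} - 0.335\,e^{-0.3 s}$, where $s$ is the distance travelled in semichords. Whether the paper uses this particular approximation is not stated in the abstract, so treat it as an assumption; the function name is also made up for this example.

```python
import numpy as np

def wagner_jones(s):
    """R. T. Jones' approximation to the Wagner function.

    s: non-dimensional distance travelled, in semichords (s >= 0).
    Returns the fraction of steady-state circulatory lift developed
    after a step change in angle of attack.
    """
    s = np.asarray(s, dtype=float)
    return 1.0 - 0.165 * np.exp(-0.0455 * s) - 0.335 * np.exp(-0.3 * s)

# Lift build-up sampled over the first 40 semichords of travel.
travel = np.linspace(0.0, 40.0, 9)
lift_fraction = wagner_jones(travel)
```

The approximation starts at one half and rises monotonically toward one, which captures the delayed growth of lift that an unsteady aerodynamic model has to account for during flapping.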
In this paper, we propose an end-to-end Retrieval-Augmented Visual Language Model (REVEAL) that learns to encode world knowledge into a large-scale memory, and to retrieve from it to answer knowledge-intensive queries. REVEAL consists of four key components: the memory, the encoder, the retriever and the generator. The large-scale memory encodes various sources of multimodal world knowledge (e.g. image-text pairs, question answering pairs, knowledge graph triplets, etc) via a unified encoder. The retriever finds the most relevant knowledge entries in the memory, and the generator fuses the retrieved knowledge with the input query to produce the output. A key novelty in our approach is that the memory, encoder, retriever and generator are all pre-trained end-to-end on a massive amount of data. Furthermore, our approach can use a diverse set of multimodal knowledge sources, which is shown to result in significant gains. We show that REVEAL achieves state-of-the-art results on visual question answering and image captioning.
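The retrieval step can be pictured as a nearest-neighbor lookup over encoded memory entries. The sketch below scores entries by inner product with the query embedding and keeps the top k. This is a generic illustration: the abstract does not describe REVEAL's scoring function, and the function name, the toy embeddings, and the value of `k` are assumptions made for this example.

```python
import numpy as np

def retrieve_top_k(query, memory, k=2):
    """Return indices and scores of the k memory rows most similar to query.

    query:  (d,) embedding of the input query.
    memory: (n, d) matrix with one encoded knowledge entry per row.
    """
    scores = memory @ query               # inner-product relevance scores
    top = np.argsort(-scores)[:k]         # highest scores first
    return top, scores[top]

# Hypothetical 2-D embeddings for four knowledge entries.
memory = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [0.7, 0.7],
                   [-1.0, 0.0]])
query = np.array([1.0, 0.2])

indices, scores = retrieve_top_k(query, memory, k=2)
```

In an end-to-end system, the retrieved rows would be passed to the generator together with the query. The scores can also be used to weight the entries so that the retriever receives a training signal.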
In recent decades, several assistive technologies for visually impaired and blind (VIB) people have been developed to improve their ability to navigate independently and safely. At the same time, simultaneous localization and mapping (SLAM) techniques have become sufficiently robust and efficient to be adopted in the development of assistive technologies. In this paper, we first report the results of an anonymous survey conducted with VIB people to understand their experience and needs; we focus on digital assistive technologies that help them with indoor and outdoor navigation. Then, we present a literature review of assistive technologies based on SLAM. We discuss proposed approaches and indicate their pros and cons. We conclude by presenting future opportunities and challenges in this domain.